
    A pragmatic investigation of the translation of swearwords in Arabic-English film subtitling

    Subtitling, a major type of audiovisual translation (AVT), has only recently received considerable attention in translation studies. As far as Arabic is concerned, most studies have discussed problems in subtitling English multimedia material, including films, into Arabic by amateur or professional subtitlers, while few have investigated the problems of subtitling Arabic films into English, especially the rendering of Arabic swearwords on screen. Swearwords are culturally laden expressions and therefore pose a challenge to film subtitlers, who deal with a variety of such expressions in different contexts. To address this gap, this study investigates one of the culture-specific problems in Arabic-English film subtitling, namely swearwords. To this end, a corpus of three Arabic films subtitled into English on an Egyptian TV channel was collected. The data were analysed using Baker’s model of pragmatic equivalence. Within this framework, the study sheds light on the strategies subtitlers use to render the swearwords of Arabic films into English on screen. The results show that source language (SL) swearwords are either toned down with euphemistic expressions or omitted altogether in the target language (TL) due to ideological and cultural considerations. Moreover, the study reveals that while some SL swearwords are translated into their pragmatic equivalents in the TL, others have no such equivalents.

    Lexical selection for machine translation

    Current research in Natural Language Processing (NLP) tends to exploit corpus resources as a way of overcoming the problem of knowledge acquisition. Statistical analysis of corpora can reveal trends and probabilities of occurrence, which have proved to be helpful in various ways. Machine Translation (MT) is no exception to this trend. Many MT researchers have attempted to extract knowledge from parallel bilingual corpora. The MT problem is generally decomposed into two sub-problems: lexical selection and reordering of the selected words. This research addresses the problem of lexical selection of open-class lexical items in the framework of MT. The work reported in this thesis investigates different methodologies to handle this problem, using a corpus-based approach. The current framework can be applied to any language pair, but we focus on Arabic and English. This is because Arabic words are hugely ambiguous and thus pose a challenge for the current task of lexical selection. We use a challenging Arabic-English parallel corpus, containing many long passages with no punctuation marks to denote sentence boundaries. This points to the robustness of the adopted approach. In our attempt to extract lexical equivalents from the parallel corpus we focus on the co-occurrence relations between words. The current framework adopts a lexicon-free approach towards the selection of lexical equivalents. This has the double advantage of investigating the effectiveness of different techniques without being distracted by the properties of the lexicon and at the same time saving much time and effort, since constructing a lexicon is time-consuming and labour-intensive. Thus, we use as little, if any, hand-coded information as possible. The accuracy score could be improved by adding hand-coded information. The point of the work reported here is to see how well one can do without any such manual intervention. 
With this goal in mind, we carry out a number of preprocessing steps in our framework. First, we build a lexicon-free Part-of-Speech (POS) tagger for Arabic. This POS tagger uses a combination of rule-based, transformation-based learning (TBL) and probabilistic techniques. Similarly, we use a lexicon-free POS tagger for English. We use the two POS taggers to tag the bi-texts. Second, we develop lexicon-free shallow parsers for Arabic and English. The two parsers are then used to label the parallel corpus with dependency relations (DRs) for some critical constructions. Third, we develop stemmers for Arabic and English, adopting the same knowledge-free approach. These preprocessing steps pave the way for the main system (or proposer), whose task is to extract translational equivalents from the parallel corpus. The framework starts by automatically extracting a bilingual lexicon using unsupervised statistical techniques which exploit the notion of co-occurrence patterns in the parallel corpus. We then choose the target word that has the highest frequency of occurrence from among a number of translational candidates in the extracted lexicon in order to aid the selection of the contextually correct translational equivalent. These experiments are carried out on either raw or POS-tagged texts. Having labelled the bi-texts with DRs, we use them to extract a number of translation seeds to start a number of bootstrapping techniques to improve the proposer. These seeds are used as anchor points to resegment the parallel corpus and start the selection process once again. The final F-score for the selection process is 0.701. We have also written an algorithm for detecting ambiguous words in a translation lexicon and obtained a precision score of 0.89.
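The core idea of the proposer, selecting the target word that co-occurs most often with a source word across aligned passages, can be illustrated with a minimal sketch. This is not the thesis's implementation (which uses richer unsupervised statistics, POS tags, and bootstrapping); the romanised toy bi-text and function names below are purely illustrative.

```python
from collections import Counter, defaultdict

def extract_candidates(bitext):
    """Count source/target word co-occurrences across aligned passages."""
    cooc = defaultdict(Counter)
    for src, tgt in bitext:
        for s in set(src.split()):
            for t in set(tgt.split()):
                cooc[s][t] += 1
    return cooc

def select_equivalent(cooc, source_word):
    """Pick the target word that co-occurs most often with the source word."""
    candidates = cooc.get(source_word)
    return candidates.most_common(1)[0][0] if candidates else None

# Toy aligned corpus (romanised Arabic / English)
bitext = [
    ("kitab jadid", "new book"),
    ("kitab qadim", "old book"),
    ("bayt jadid", "new house"),
]
cooc = extract_candidates(bitext)
print(select_equivalent(cooc, "kitab"))  # -> book (co-occurs in 2 of 3 passages)
```

Real parallel corpora require association measures that correct for overall word frequency (e.g. mutual information), but even raw co-occurrence counts, as here, recover the dominant translation when the corpus is large enough.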

    Towards Corpus-Based Stemming for Arabic Texts

    Stemming is an essential processing step in a number of natural language processing (NLP) applications such as information extraction, text analysis and machine translation. It is the process of reducing words to their stems. This paper presents a light stemmer for Arabic, using a corpus-based approach. The stemmer groups morphological variants of words in an Arabic corpus based on shared characters, before stripping off their affixes (prefixes and suffixes) to produce their common stem. Experimental results show that 86% of words in the test set were correctly grouped under a similar reduced form (i.e., the possible stem). In some cases the reduced form is not the legitimate stem. The evaluation shows that 72.2% of the words in the test set were reduced to their legitimate stem. The current stemmer is developed with the future aim of investigating the effectiveness of using word stems for extracting bilingual equivalents from an Arabic-English parallel corpus.
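The affix-stripping step of light stemming can be sketched as follows. This is only an illustration of the general technique, not the paper's stemmer (which first groups variants corpus-wide by shared characters); the prefix/suffix lists and the minimum-length guard against over-stemming are illustrative assumptions.

```python
# Common Arabic prefixes and suffixes used by typical light stemmers;
# the exact lists here are illustrative, not taken from the paper.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ية", "ه", "ة"]

def light_stem(word, min_len=2):
    """Strip at most one prefix and one suffix, keeping the remainder
    at least min_len characters long to avoid over-stemming."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word

print(light_stem("المكتبات"))  # -> مكتب (strips "ال" and "ات")
```

Pure affix stripping like this produces the "possible stem" the paper describes: a shared reduced form that is not always the linguistically legitimate stem, which is why the paper combines it with corpus-based grouping.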

    Earth Fissures in Wadi Najran, Kingdom of Saudi Arabia

    The formation of earth fissures due to groundwater depletion has been reported in many places in North America, Europe, and Asia. Najran Basin lies in the southern part of the Kingdom of Saudi Arabia, where agricultural activities and other groundwater uses have caused significant groundwater depletion. The basin recently experienced a sudden appearance of numerous earth fissures. An interdisciplinary study, consisting of an evaluation of land-use changes together with hydrological, hydrogeological, and geophysical investigations, was conducted to determine the cause of the fissures. The hydrological analysis clearly showed that groundwater levels are declining over time. Groundwater depletion leads to the accumulation of subsurface stress, causing soil hydro-consolidation, which creates ideal conditions for the formation of earth fissures. Electrical resistivity data indicated anomalies in the profiles, most probably due to the presence of subsurface topography, another key factor in the formation of the earth fissures.